Abstract

Coronaviruses recently emerged as major human pathogens causing outbreaks of severe acute respiratory syndrome and Middle-East respiratory syndrome. They utilize the spike (S) glycoprotein anchored in the viral envelope to mediate host attachment and fusion of the viral and cellular membranes to initiate infection. The S protein is a major determinant of the zoonotic potential of coronaviruses and is also the main target of the host humoral immune response. We report here the 3.5 Å resolution cryo-electron microscopy structure of the S glycoprotein trimer from the pathogenic porcine deltacoronavirus (PDCoV), which belongs to the recently identified delta genus. Structural and glycoproteomics data indicate that the glycans of PDCoV S are topologically conserved when compared with the human respiratory coronavirus HCoV-NL63 S, resulting in similar surface areas being shielded from neutralizing antibodies and implying that both viruses are under comparable immune pressure in their respective hosts. The structure further reveals a shortened S2′ activation loop, containing a reduced number of basic amino acids, which contributes to rendering the spike largely protease-resistant. This property distinguishes PDCoV S from recently characterized betacoronavirus S proteins and suggests that the S protein of enterotropic PDCoV has evolved to tolerate the protease-rich environment of the small intestine and to fine-tune its fusion activation to avoid premature triggering and reduction of infectivity.

Importance

Coronaviruses use transmembrane spike (S) glycoprotein trimers to promote host attachment and fusion of the viral and cellular membranes. We determined a near-atomic resolution cryo-electron microscopy structure of the S ectodomain trimer from the pathogenic porcine deltacoronavirus (PDCoV), which is responsible for diarrhea in piglets and has had devastating consequences for the swine industry worldwide. Structural and glycoproteomics data reveal that PDCoV S is decorated with 78 N-linked glycans obstructing the protein surface to limit accessibility to neutralizing antibodies in a way reminiscent of what has recently been described for a human respiratory coronavirus. PDCoV S is largely protease-resistant, which distinguishes it from most other characterized coronavirus S glycoproteins and suggests that enteric coronaviruses have evolved to fine-tune fusion activation in the protease-rich environment of the small intestine of infected hosts.

Introduction

Coronaviruses are large enveloped viruses, with single-stranded positive-sense RNA genomes, classified in four genera (α, β, γ, and δ) based on their sequence similarity. Most recognized coronaviruses are animal viruses but four coronaviruses, namely HCoV-229E, HCoV-OC43, HCoV-NL63 and HCoV-HKU1, are known to continuously circulate in the human population and are associated with up to 30% of respiratory tract infections(1). In addition, severe acute respiratory syndrome (SARS-CoV) and Middle-East respiratory syndrome (MERS-CoV) coronaviruses are zoonotic viruses causing deadly pneumonia in humans(2). SARS-CoV and MERS-CoV have resulted in more than 8,000 and 2,000 cases with fatality rates of 10 and 35%, respectively. No specific antiviral treatments or vaccines are approved for human coronaviruses and zoonosis remains a great pandemic threat.

The ability to recognize the appropriate receptor and to efficiently enter host cells are key requirements for cross-species spillover of zoonotic viruses such as influenza(3). For coronaviruses, these two functions are carried out by the spike (S) glycoprotein. Therefore, structural and functional studies of S glycoproteins can provide invaluable information to evaluate the cross-species transmission potential of these viruses. The coronavirus S protein is a class I viral fusion protein that forms homotrimers decorating the viral envelope. It is composed of an N-terminal S1 subunit, responsible for receptor binding, and a C-terminal S2 subunit, which contains the fusion machinery. The combined activities of the two subunits promote coronavirus attachment to host cells and subsequent fusion of the viral and cellular membranes, via irreversible conformational changes, initiating viral infection. Since it is the major surface protein, S is also the main target of neutralizing antibodies during infection and a focus of vaccine design.

The zoonotic potential of coronaviruses is determined by the receptor-binding properties of the S protein. For instance, SARS-CoV and MERS-CoV bind with high affinity to their cognate human receptors, angiotensin-converting enzyme 2 (ACE2) and dipeptidyl peptidase 4 (DPP4), respectively(4, 5). Metagenomic data revealed that many MERS-CoV and SARS-CoV-like viruses exist in bats and one such virus, WIV-1, isolated from bat feces, shares 99.9% nucleotide sequence identity with SARS-CoV. The S protein encoded by WIV-1 binds human, bat and civet ACE2 orthologues, allowing the virus to efficiently infect human cells expressing any of these three orthologues(6, 7). Similarly, HKU4-CoV and HKU5-CoV, which are closely related to MERS-CoV, have been identified in bats, and HKU4-CoV can be adapted to bind human DPP4 by substituting three amino acids in the S receptor-binding domain(8, 9).

The zoonotic potential of coronaviruses is further determined by fusion activation, which requires S processing by host proteases. Up to two cleavage sites are present in S glycoproteins: a site found at the boundary between the S1 and S2 subunits of some coronavirus S (the S1/S2 site) and a conserved site upstream from the fusion peptide (the S2′ site)(10).

For a subset of coronaviruses, such as MHV, SARS-CoV and MERS-CoV, the S glycoprotein is cleaved at the S1/S2 junction during biogenesis and viral egress(10-13). This proteolytic event, along with subsequent binding to the host receptor, enhances processing at the S2′ site and participates in MERS-CoV or SARS-CoV fusion activation(11, 13). Moreover, substitution of two residues at the boundary between the S1 and S2 subunits enables efficient processing by human proteases and allows the bat-infecting HKU4-CoV S protein to mediate entry into human cells(14).

Proteolysis at the conserved S2′ site is essential for fusion activation of all characterized coronavirus S proteins and it can occur at the host membrane or in internal cellular compartments. For instance, transmembrane protease/serine protease (TMPRSS) processing of SARS-CoV and MERS-CoV S at the cell membrane, furin-mediated processing of HCoV-NL63 and MERS-CoV S in the early endosomes, or endo-lysosomal protease-mediated triggering of SARS-CoV S (by cathepsin L) and MHV S are key events orchestrating spatial and temporal activation of fusion to ensure successful viral entry into host cells(12, 13, 15). Alternatively, porcine epidemic diarrhea coronavirus (PEDV), which replicates in the epithelial cells of the small intestine, undergoes S proteolytic activation by trypsin, which is highly abundant in the lumen of this organ(16). These examples illustrate how the availability of host proteases and the mechanism of proteolytic activation can directly restrict coronavirus activation, viral tropism, and pathogenesis.

One common pattern shared by both SARS and MERS outbreaks is that although they both originated in bats, an intermediate host with closer physical proximity to humans allowed for more efficient cross-species transmission. Palm civets and camels were the most probable intermediate hosts for SARS-CoV and MERS-CoV, respectively(7, 17, 18). Due to their proximity with humans, pigs also acted as intermediate hosts for the influenza pandemic(19) and for the emergence of Nipah virus in Malaysia(20). To date, only α- and β-coronaviruses have been implicated in human diseases and several S glycoproteins from viruses belonging to these two genera have been structurally characterized(21-26). To the best of our knowledge, no porcine coronaviruses have crossed the species barrier to infect humans, and their receptor usage appears to favor porcine orthologues. Porcine epidemic diarrhea virus (PEDV), however, can infect pig, human, monkey and bat cells, suggesting it has the potential to spill over to species other than pig(27). As a result, cross-species transmission of coronaviruses poses an imminent and long-term threat to human health, which emphasizes the need for surveying and studying these viruses to prevent and control infections.

The recently emerged porcine deltacoronavirus (PDCoV) is responsible for diarrhea in piglets and has had devastating consequences for the swine industry worldwide(28, 29). No vaccines or treatments are available for PDCoV. Here, we report the cryoEM structure of the PDCoV S trimer, revealing that it has a molecular architecture most closely related to the S glycoproteins of the α-genus of coronaviruses. Integrating structural and glycoproteomics data, we discovered that PDCoV S masks potential epitopes with glycans in a way reminiscent of the human respiratory α-coronavirus HCoV-NL63 S glycoprotein(22). These results support a relatedness between α- and δ-coronavirus S glycoproteins and suggest that the immune system of infected hosts exerts comparable selection pressure on these viruses, which has led to these adaptations. The structure also reveals that the C-terminal S2 fusion machinery of the PDCoV S protein features a short S2′ activation loop which appears to be largely resistant to proteolysis by trypsin/chymotrypsin. We conclude that PDCoV has evolved to be highly adapted to the protease-rich environment of the enteric tract to ensure proper spatial and temporal activation of fusion and prevent premature triggering, which would significantly impact virus infectivity.

Results

Structure determination of the PDCoV S glycoprotein

PDCoV was first identified in Hong Kong in 2012(29) and it has since spread rapidly in the swine population across the globe(28, 29). Due to its recent emergence, relatively little is known about this virus compared to other swine coronaviruses. One feature that distinguishes PDCoV from other known coronaviruses is that it encodes one of the smallest S glycoproteins. We therefore set out to explore the architectural diversity of S proteins across coronavirus genera to understand shared and unique features of the structurally uncharacterized δ-genus.

We used Drosophila S2 cells to produce the PDCoV/USA/Illinois121/2014 S ectodomain (residues 1-1098) with a C-terminal fusion adding a GCN4 trimerization motif and a strep-tag(30). Following sample vitrification by triple blotting(31), data were acquired on an FEI Titan Krios electron microscope equipped with a Gatan Quantum GIF energy filter operated in zero-loss mode and a Gatan K2 Summit electron-counting camera operated in super-resolution mode (Fig 1A-B). We determined a 3D reconstruction at 3.5 Å resolution resolving most amino acid side chains, disulphide bonds and N-linked glycans (Fig S1A). These features were used as fiducials to confirm the sequence register during model building (Fig 1C-F and S1B-E Fig). Starting from the HCoV-NL63 S structure(22), we obtained an atomic model of the PDCoV S trimer using manual modeling in Coot(32) and Rosetta density-guided iterative refinement(33). The final model comprises residues 52 to 1021 and 21 N-linked glycans (Table 1).

The PDCoV S protein assembles as a compact trimer with a height of ~145 Å and a width of 115 Å (Fig 1C-D). The S1 subunit has a modular organization comprising four distinct domains, designated A, B, C and D, whereas the S2 subunit adopts a mostly helical elongated architecture with a connector domain appended to its C-terminal end(21, 22) (Fig 1E-F).

The extensive PDCoV S glycan shield

The unsharpened PDCoV S map resolves 21 N-linked glycans for each protomer that form prominent protrusions extending from the protein surface (Fig 2A-B and Fig S1F-G). Using on-line reversed-phase liquid chromatography with electron transfer/high-energy collision-dissociation tandem mass spectrometry(34), we detected 16 N-linked glycosylation sites corresponding to those observed in the cryoEM map and confirmed 5 additional sites located in the structurally unresolved N- and C-terminal parts of the protein (Fig 2C and Table S1). Combining our structural and mass-spectrometry data, we found evidence for glycosylation at 26 out of 27 possible NXS/T glycosylation sequons. The intact glycopeptides detected by MS/MS for PDCoV S expressed in Drosophila S2 cells corresponded mostly to paucimannosidic glycans containing 3 mannose residues (with or without core fucosylation) and oligomannose glycans containing 4 to 9 mannose residues. We also detected complex glycans (with or without core fucosylation), which appears compatible with the accessibility and crowding of these carbohydrate chains that would permit processing(35, 36).
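The sequon count mentioned above can be illustrated with a minimal sketch, assuming the common definition of an N-linked sequon as Asn-X-Ser/Thr with X being any residue except proline (the text only writes NXS/T, so the proline exclusion is an assumption here). The function name `find_sequons` and the toy sequence are hypothetical and are not taken from the PDCoV S sequence or from the authors' analysis pipeline.

```python
import re

# N-linked glycosylation sequon: Asn, any residue except Pro, then Ser or Thr.
# The zero-width lookahead lets overlapping sequons (e.g. "NNSS") each be found.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence: str) -> list[int]:
    """Return the 1-based position of the Asn in every N-X-S/T sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]


# Hypothetical toy sequence (NOT the PDCoV S sequence).
toy_sequence = "MANGSLNPTQNNSSA"
print(find_sequons(toy_sequence))  # [3, 11, 12]; "NPT" is skipped because X is Pro

# 26 glycosylated sequons per protomer in a trimer is consistent with the
# 78 N-linked glycans quoted in the Importance section.
GLYCANS_PER_TRIMER = 26 * 3
print(GLYCANS_PER_TRIMER)  # 78
```

Running the same scan on the full S sequence (for example the UniProt entry cited in the methods) would be one way to enumerate candidate sites, although a sequon is only a prerequisite for glycosylation and occupancy still has to be confirmed experimentally, as done here by cryoEM and mass spectrometry.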

Overall, the glycan coverage of PDCoV S is dense and extensively decorates the accessible surface of the trimer. Although we detected substantially more N-linked glycans for HCoV-NL63 S(22) (34 sites per protomer), 6 validated glycans reside within the N-terminal domain 0, which is absent in PDCoV S and explains most of the discrepancy in the number of sites. Strikingly, numerous glycans identified in the PDCoV S structure overlap with glycans in the HCoV-NL63 S protein, either strictly or topologically, with most differences towards the viral membrane distal end of the molecule (Fig 2D-E). Transmission of zoonotic viruses into humans can result in drastic changes in glycosylation, as exemplified by the human influenza H3 hemagglutinin that has doubled its number of glycosylation sites since the 1968 pandemic although its amino acid sequence remains ~88% identical(37). There is considerable sequence divergence between the HCoV-NL63 and PDCoV S glycoproteins, which share 43% amino acid sequence identity. The observation that numerous glycosylation sites are conserved between the two proteins suggests that α- and δ-coronaviruses could face similar immune pressure in their respective hosts, and that the areas that are masked by the conserved glycans are key to the function of these S glycoproteins. Based on the information gained from the HCoV-NL63 S structure(22), for which glycans appear to contribute to masking the receptor-binding loops from antibody recognition, we suggest that the glycan shield of PDCoV S and other coronavirus S glycoproteins could assist in immune evasion similarly to the well-characterized HIV-1 envelope trimer(35, 36).

Finally, coronavirus S glycans have previously been proposed to participate in host cell entry(38), since L-SIGN lectin can be used as an alternative receptor by SARS-CoV(39) and HCoV-229E(40), and it is conceivable they play a similar role for other S proteins.

Architecture of the S1 receptor-binding subunit

The PDCoV and HCoV-NL63 S1 subunits exhibit strikingly similar structures (r.m.s.d. ~2.7 Å over 448 aligned Cα positions), except for the absence of the N-terminal domain 0 in the former glycoprotein (Fig 3A). Deletion of domain 0, which is responsible for attachment to sialoglycans, in the porcine transmissible gastro-enteritis virus (TGEV) S gene gave rise to porcine respiratory coronavirus (PRCV) and in turn resulted in a loss of enteric tropism(41, 42). PDCoV and HCoV-NL63, however, exhibit opposite behavior, as they target the enteric or the respiratory tracts despite the absence or presence of a domain 0 in their S glycoproteins, respectively. We describe below the functionally relevant similarities and differences detected in the PDCoV S structure relative to other coronavirus S structures.
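For readers less familiar with the r.m.s.d. values quoted throughout the structural comparisons, the quantity is the root-mean-square distance between matched Cα atoms. The sketch below assumes the two coordinate sets have already been paired residue-by-residue and optimally superposed by a structural alignment program (that step is not shown). The function name `rmsd` and the coordinates are hypothetical illustrations, not values from the PDCoV S model.

```python
import math

Point = tuple[float, float, float]


def rmsd(coords_a: list[Point], coords_b: list[Point]) -> float:
    """Root-mean-square deviation (in the input units, e.g. angstroms)
    between two equally long lists of already-superposed, paired atoms."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared_sum / len(coords_a))


# Hypothetical coordinates: every atom pair is displaced by exactly 1 unit along z.
model_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(model_a, model_b))  # 1.0
```

The number of aligned Cα positions reported alongside each r.m.s.d. corresponds to the length of the paired lists, which is why both numbers are needed to judge how similar two structures are.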

Domain A is located at the viral membrane distal side and accounts for a large part of the exposed surface area of the S1 subunit. It folds as a galectin-like β-sandwich supplemented with a helix on the viral membrane distal side and a three-stranded antiparallel β-sheet plus a helix on the proximal side. The domain A surface is heavily glycosylated and features 7 glycans for PDCoV (Fig 3D). We previously reported that the HCoV-NL63 S glycan linked to Asn358 (domain A) points towards the receptor-binding domain B, masking residues involved in receptor recognition. A marked difference between the A domains of PDCoV S and HCoV-NL63 S is that the β-hairpin harboring Asn358 in HCoV-NL63 S features a deletion of 10 residues, significantly shortening it in PDCoV S (Fig 3A-C). Moreover, the topologically equivalent glycan linked to residue Asn-184 of PDCoV S protrudes away from domain B and does not significantly cover it, in contrast to what was observed for HCoV-NL63 (Fig 3A-C).

OC43, HKU1 and bovine coronavirus (BCoV) are known to use 9-O-acetyl-sialylated cellular receptors for attachment to host cells(43, 44). Structural and biochemical studies showed that domain A mediates these interactions and mapped key residues involved(25, 45), and nanoparticle-displayed multimeric OC43 S1 subunit exhibited high hemagglutination titer (Fig. 3F). Comparison of the PDCoV, HKU1 and BCoV domain A structures indicated that PDCoV cannot interact with 9-O-acetyl-sialoglycans in a similar way due to the absence of the strictly conserved residues involved in binding (BCoV Tyr162, Glu182, Trp184 and His185) and of the loops delineating the binding cavity (Fig. 3D-E). In line with this observation, isolated or nanoparticle-displayed multimeric PDCoV S1 subunit failed to interact with sialic acid using an erythrocyte hemagglutination assay (Fig. 3F), indicating that sialic acid (or at least the types of sialoglycans displayed on these erythrocytes) does not participate in PDCoV S attachment to host cells.

Journal of Virology, Accepted Manuscript Posted Online. Downloaded from http://jvi.asm.org/ on November 3, 2017 by UNIV OF NEWCASTLE

Domain B folds as a β-sandwich reminiscent of the equivalent domain of α-coronaviruses such as PRCV (r.m.s.d.= 2.1 Å over 108 aligned Cα positions), HCoV-NL63 (r.m.s.d.= 1.9 Å over 107 aligned Cα positions) and TGEV (r.m.s.d.= 3.0 Å over 109 aligned Cα positions) (Fig 4A-D)(22, 46, 47). The two PDCoV S glycosylation sites identified at Asn-311 and Asn-331 in domain B are topologically or strictly conserved with the HCoV-NL63 S glycans linked to Asn-486 and Asn-512, respectively. PRCV and TGEV B domains also feature topologically similar glycosylation sites on the solvent-exposed surface of the β-sandwich and these glycans are likely to limit the immune response against this domain, which is known to be the target of neutralizing antibodies for several coronaviruses(47-52). The glycan linked to Asn-506 in HCoV-NL63 S is absent in PDCoV S, for which the equivalent residue is Ser-325 (Fig 4A and C). Since masking of receptor-binding residues has been suggested to assist HCoV-NL63 immune evasion(22), the reduced overall glycan coverage of PDCoV domain B could result from weaker immune pressure directed at the receptor-binding region in pigs compared to HCoV-NL63 S in humans.

Previous work showed that the loops located at the viral membrane distal end of the β-sandwich of domain B in α-coronavirus S glycoproteins are responsible for binding to diverse host receptors such as ACE2 (HCoV-NL63)(46) or pAPN (PRCV/TGEV)(47). Although the distal loops are significantly shorter for PDCoV compared to these two α-coronaviruses, loop 1 and loop 3 contain several aromatic residues (Fig 4A-D). Since aromatic residues in these loops have been shown to directly participate in receptor binding for HCoV-NL63, PRCV and TGEV, we speculate that they could also mediate interactions of the PDCoV B domain with its receptor. As is the case for HCoV-NL63 S, the PDCoV B domain has an opposite orientation, related by a ~180° rotation, to the equivalent domain of β-coronavirus S glycoproteins(21). This results in burying the distal loops of the β-sandwich through interactions with domain A belonging to the same protomer and in turn restrains the availability of the putative receptor-binding motif to interact with the receptor (Fig 3B-C). As a result, it is likely that PDCoV and HCoV-NL63 S glycoproteins can undergo similar conformational changes to those described for domain B of SARS-CoV and MERS-CoV S to interact with their cognate receptors(23, 24, 26). A major difference, however, is that β-coronavirus S using domain B as receptor-binding domain appear to spontaneously undergo these rearrangements whereas α- and δ-coronavirus S do not and rely on a yet unidentified stimulus(22).

Organization of the S2 fusion machinery

The C-terminal S2 subunit trimer fuses the viral and cellular membranes at the onset of infection and is the most conserved region among coronavirus S glycoproteins. The PDCoV S2 subunit is structurally similar to α- and β-coronavirus S2 subunits such as HCoV-NL63(22) (r.m.s.d.= 1.7 Å over 413 aligned Cα positions) and MHV(21) (r.m.s.d.= 2.2 Å over 291 aligned Cα positions), respectively. The conserved architecture of the S2 fusion machinery across multiple genera highlights that coronaviruses rely on a common fusion mechanism to enter host cells(53). Despite these striking similarities, coronavirus fusion machineries exhibit differences with key functional implications for their activation mechanism and potential for zoonotic spillover.

The S2′ activation loop, which connects the upstream helix to the fusion peptide and regulates the spatial and temporal activation of fusion, is resolved in the PDCoV S cryoEM map (Fig 5A-B), as was the case for the HCoV-NL63 S(22) and SARS-CoV S structures(23, 24). However, the PDCoV S2′ loop is short (LTTRIGGR) and comprises 6 and 3 fewer residues than the HCoV-NL63 S (LPQRNIRSSRIAGR) and SARS-CoV S (ILPDPLKPTKR) counterparts, respectively. In addition, the S2′ loop of these viruses contains multiple positive charges, including two putative furin cleavage sites for HCoV-NL63 S, whereas the PDCoV S2′ loop harbors a single positively charged residue (Arg-669) in addition to the conserved Arg-673 residue (Fig 5A-B). These structural features help rationalize the known protease requirements for fusion activation of the HCoV-NL63 S glycoprotein, which is preferentially cleaved by furin in the endosomes, and of the SARS-CoV S glycoprotein, which is preferentially processed by cathepsin L in the endo-lysosomes, and explain the fact that trypsin-like TMPRSS proteases can also trigger both proteins(10, 15). The paucity of positive charges in the PDCoV S2′ trigger loop is in line with the requirement for trypsin or other pancreatic proteases to allow virus passaging and the fact that PDCoV is exposed to high concentrations of such proteases in the enteric tract of infected pigs(28).
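The loop comparison above can be checked directly from the three sequences quoted in the text. The sketch below counts loop length and basic residues (Lys/Arg) and scans for the minimal furin recognition motif R-X-X-R; using this minimal motif, rather than a dedicated cleavage-site predictor, is a simplifying assumption, and the helper name `summarize_loop` is hypothetical.

```python
import re

# S2' activation loop sequences exactly as quoted in the text.
S2_PRIME_LOOPS = {
    "PDCoV": "LTTRIGGR",
    "HCoV-NL63": "LPQRNIRSSRIAGR",
    "SARS-CoV": "ILPDPLKPTKR",
}

# Minimal furin recognition motif R-X-X-R (a simplification, see lead-in).
# The lookahead allows overlapping motifs that share an arginine.
FURIN_MINIMAL = re.compile(r"(?=(R..R))")


def summarize_loop(loop: str) -> dict:
    """Return length, number of basic residues (K/R) and R-X-X-R matches."""
    return {
        "length": len(loop),
        "basic": sum(loop.count(residue) for residue in "KR"),
        "furin_like": [m.group(1) for m in FURIN_MINIMAL.finditer(loop)],
    }


for virus, loop in S2_PRIME_LOOPS.items():
    print(virus, summarize_loop(loop))
# PDCoV     -> length 8,  2 basic residues, no R-X-X-R motif
# HCoV-NL63 -> length 14, 4 basic residues, motifs RNIR and RSSR
# SARS-CoV  -> length 11, 3 basic residues, no R-X-X-R motif
```

The output reproduces the differences of 6 and 3 residues stated above, the two arginines of the PDCoV loop (Arg-669 and Arg-673), and the two putative furin sites in the HCoV-NL63 loop.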

Studies on influenza hemagglutinin highlighted that glycans can modulate cleavage site accessibility to proteases and in turn influence fusion activation(54, 55). Similar observations were drawn from comparisons between the MERS-CoV and HKU4 S glycoproteins(14). Notably, the PDCoV S glycans linked to Asn-652 and Asn-661 decorate the trimer surface near the S2′ trigger loop and could limit accessibility to proteases and play a role in orchestrating fusion activation (Fig 5B). These glycans are conserved with an identical structural organization in HCoV-NL63 S and may have the same putative function(22).

Sequence alignment of representative S glycoproteins from viruses of the four coronavirus genera shows that α- and δ-coronaviruses feature a 14-residue long insertion in the heptad-repeat 1 (HR1) and in the HR2 motifs, corresponding to two heptad-repeats, compared to β-coronaviruses (S2A Fig). γ-coronaviruses form a heterogeneous group comprising S glycoproteins without insertion but also S with one (BeCoV-SW1 and BdCoV-HKU22) or two (TurkeyCoV-MG10) additional heptad-repeats in HR1 and in HR2 compared to β-coronaviruses. The HR1 insertion is resolved in the PDCoV S cryoEM map (Fig 5C, residues 797-811) and corresponds to the addition of two helical turns (also visible in the HCoV-NL63 S structure) preceded by a loop (poorly resolved in the HCoV-NL63 S structure). This polypeptide segment is known to refold to form a central triple helical coiled-coil in the post-fusion S structure(53). The HR2 insertion cannot be visualized as this region is disordered in the PDCoV S reconstruction and in all other coronavirus pre-fusion S structures. Mapping the HR1 and HR2 insertions on the HCoV-NL63 S post-fusion core X-ray structure(56), however, reveals that these polypeptide segments directly interact within the 6-helix bundle (S2B Fig). This suggests that the strict correlation of their presence or absence in both HR1 and HR2, along with the observation that insertions always correspond to an integer number of heptad repeats, is necessary to maintain the proper geometry of the fusion machinery and allow the conserved conformational changes driving membrane merger to take place with high efficiency.
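The argument about integer numbers of heptad repeats can be made concrete with a small sketch: an insertion preserves the a-g register of a coiled-coil only if its length is a multiple of 7. The helper names `register_shift` and `heptad_units` are hypothetical, and the non-multiple example length is an illustrative assumption rather than an observed insertion.

```python
HEPTAD_LENGTH = 7  # residues per heptad repeat (positions a to g)


def register_shift(insertion_length: int) -> int:
    """Number of heptad positions by which residues downstream of an
    insertion are shifted; 0 means the a-g register is preserved."""
    return insertion_length % HEPTAD_LENGTH


def heptad_units(insertion_length: int):
    """Return the number of whole heptads in the insertion, or None if the
    insertion would break the heptad register."""
    if register_shift(insertion_length) != 0:
        return None
    return insertion_length // HEPTAD_LENGTH


print(heptad_units(14))    # 2: the 14-residue insertion described in the text
print(heptad_units(7))     # 1: a single additional heptad repeat
print(register_shift(10))  # 3: a hypothetical 10-residue insertion shifts the register
```

Under this simple picture, matching insertions in HR1 and HR2 keep the hydrophobic positions of both motifs in register within the post-fusion 6-helix bundle, which is the geometric constraint the paragraph above proposes.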

Discussion

Structural and functional studies of coronavirus S glycoproteins are key to understanding host and tissue tropism as well as the mechanisms of receptor binding and fusion activation. The data reported in this manuscript establish a strong connection between α- and δ-coronavirus S glycoproteins. HCoV-NL63, PRCV, TGEV and PDCoV B domains fold as similar β-sandwiches that are structurally distinct from the single β-sheet observed for the equivalent domain of β-coronaviruses(4, 5, 21, 25, 47). Moreover, the structures of HCoV-NL63 S and PDCoV S show that both glycoproteins share a common organization of their S1 subunits in which the B domain directly interacts with domain A from the same subunit to potentially limit accessibility of the receptor-binding loops to neutralizing antibodies. Sequence analysis indicates a strict correlation of the presence or absence of the HR1/HR2 insertions in the S glycoprotein sequence and an apparent evolutionary pressure restricting insertions/deletions to heptad-repeat units, which we postulate to be necessary for efficient S refolding and fusion. Based on these criteria, we put forward that α- and δ-coronavirus S glycoproteins share closer evolutionary relationships with each other than they do with S of the other two coronavirus genera, although insertions in HR1 and HR2 have also been detected in a subset of γ-coronavirus S proteins.

We previously recapitulated in vitro the proteolytic activation of MHV, SARS-CoV and MERS-CoV pre-fusion S trimers, via trypsin incubation under limited proteolysis conditions, which led to spontaneous refolding into post-fusion S2 trimers (the ground state of the fusion reaction)(53). In contrast, the PDCoV S glycoprotein remained largely uncleaved even after extended incubation times with up to 5:1 molar ratio of S to trypsin or chymotrypsin (0.1 mg/ml) (Figure 6). These results suggest that fusion activation of PDCoV S, which is believed to be promoted by trypsin(28), involves an additional step to expose the S2′ cleavage site, such as the receptor-binding induced conformational changes described for MERS-CoV(13), SARS-CoV(57), MHV(58) and PEDV(16). These distinct protease sensitivities are reminiscent of the differences reported between clinical isolates (CV777) and cell-culture adapted (caDR13) PEDV strains for which infectivity strictly requires or is hampered by trypsin, respectively(16). We put forward that the S glycoprotein sequence and in turn structure of PDCoV and PEDV CV777 have evolved to be resistant to pancreatic proteases to which both viruses are exposed in the enteric tract during infection. Fine-tuning of fusion activation is likely achieved by restraining access to the S2′ cleavage site until receptor binding occurs at the host cell surface. This event could promote conformational changes exposing the S2′ site to allow processing by trypsin or other proteases with exquisite spatial and temporal coordination. In contrast, SARS-CoV, MERS-CoV or MHV are not expected to be exposed to pancreatic proteases during the virus life-cycle and their S glycoproteins presumably did not evolve with this selection pressure, explaining their sensitivity to trypsin and chymotrypsin. In agreement with what has been postulated for SARS-CoV(57) and PEDV caDR13(16), trypsin sensitivity could result in premature cleavage/triggering of the pre-fusion S trimer and attenuation of infectivity and viral fitness.

While completing this study, another group also determined a cryoEM reconstruction of the PDCoV S glycoprotein ectodomain and both structures can be superposed with excellent agreement (r.m.s.d. of 1.1 Å over 959 aligned Cα positions).

421

437 A gene fragment encoding the PDCoV S ectodomain (residues 20-1098, UniProt:

438 W8Q9Y7) was PCR-amplified from a plasmid containing the full-length S gene. The

439 PCR product was ligated to a gene fragment encoding a GCN4 trimerization motif

440 (LIKRMKQIEDKIEEIESKQKKIENEIARIKKIKH)(21, 59, 60), a thrombin cleavage site

441 (LVPRGSLE), an 8-residue Strep-Tag (WSHPQFEK) and a stop codon, as

442 previously described(61). Subsequent cloning was performed in the pMT/BiP/V5-His

443 expression vector (Invitrogen) in frame with the Drosophila BiP secretion signal

444 downstream of the metallothionein promoter.
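The C-terminal extension described above can be written out explicitly. This is a minimal sketch, assuming the three elements are joined in the order listed with no linker residues between them and ignoring any vector-derived residues. The variable and function names are illustrative.

```python
# Sequences quoted in the text.
GCN4 = "LIKRMKQIEDKIEEIESKQKKIENEIARIKKIKH"   # trimerization motif
THROMBIN_SITE = "LVPRGSLE"                    # thrombin cleavage site
STREP_TAG = "WSHPQFEK"                        # 8-residue Strep-Tag

def c_terminal_tag():
    """Concatenate the tag elements in the order given in the text
    (assumption: no linker residues in between)."""
    return GCN4 + THROMBIN_SITE + STREP_TAG

def ectodomain_length(first_residue, last_residue):
    """Number of residues in an inclusive residue range."""
    return last_residue - first_residue + 1

tag = c_terminal_tag()
print("tag length:", len(tag))
print("PDCoV S ectodomain length:", ectodomain_length(20, 1098))
```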

445 A human codon-optimized gene encoding the ectodomain (residues 14-1180) of the

446 SARS-CoV S protein (UniProt: P59594) was cloned into a modified pOPING vector(62)

447 (Addgene), introducing an N-terminal Mu-phosphatase signal peptide and, at the

448 C-terminus of the construct, a TEV protease cleavage site, a foldon and a hexa-histidine

449 tag.

450 

451 Production of recombinant PDCoV S ectodomain in Drosophila S2 cells 

452 To generate a stable Drosophila S2 cell line expressing the recombinant PDCoV S 

453 ectodomain, we used Effectene (Qiagen) and 2 µg of plasmid. Puromycin N-acetyl

454 transferase was co-transfected as a dominant selectable marker. Stable PDCoV S

455 expressing cell lines were selected by addition of 7 µg/mL puromycin (Invitrogen) to the

456 culture medium 48 h after transfection. For large-scale production, the cells were

457 cultured in spinner flasks and induced with 5 µM CdCl₂ at a density of approximately

458 10⁷ cells per mL. After a week at 28 °C, clarified cell supernatants were concentrated 40-

459 fold using Vivaflow tangential filtration cassettes (Sartorius, 10 kDa cutoff) and adjusted

460 to pH 8.0, before affinity purification using a StrepTactin Superflow column (IBA) followed

461 by gel filtration chromatography using a Superose 6 10/300 GL column (GE Life

462 Sciences) equilibrated in 20 mM Tris-HCl pH 7.5 and 100 mM NaCl. The concentration

463 of the purified protein was estimated using absorption at 280 nm. 

464 

465 Production of recombinant SARS S ectodomain in HEK293F cells 

466 Transient transfection of 250 mL of HEK293F cells at a density of 10⁶ cells/mL was

467 performed using 293fectin (ThermoFisher) and Opti-MEM (ThermoFisher). After 3 days

468 the cells were harvested before affinity purification with a Talon 5 mL cobalt column

469 equilibrated in 25 mM sodium phosphate pH 8.0, 300 mM NaCl, 10 mM imidazole. The

470 purified protein was buffer exchanged into 20 mM Tris pH 8.0, 150 mM NaCl and

471 concentrated to 1.0 mg/mL.

472 

473 CryoEM specimen preparation and data collection 

474 Two microliters of purified PDCoV S at ~0.5 mg/mL was triple-blotted(31) using 1.2/1.3

475 C-flat grids (Protochips), which had been glow discharged for 30 seconds at 20 mA.

476 Grids were then plunge-frozen in liquid ethane using an FEI Mark I Vitrobot with 7.5

477 seconds blot time and an offset of -3 mm at 100% humidity and 25 °C. Data were

478 collected using SerialEM automatic data collection software(63) on an FEI Titan Krios

479 operated at 300 kV and equipped with a Gatan Quantum GIF energy-filter operated in

480 zero-loss mode with a slit width of 20 eV and a Gatan K2 Summit direct electron

481 detector camera operated in super-resolution mode. The dose rate was adjusted to ~5

482 counts/pixel/s and each movie was acquired in counting mode fractionated in 75 frames

483 of 200 ms. 2,000 micrographs were collected in a single session using a defocus range

484 between 1.5 and 4.0 µm.
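The movie parameters above imply a total exposure time per movie and, combined with the electron dose listed in Table 1, an average dose per frame. This is a minimal sketch of that arithmetic, assuming the dose is spread evenly over the frames. The constant and function names are illustrative.

```python
# Values quoted in the text and in Table 1.
N_FRAMES = 75
FRAME_TIME_S = 0.200          # 200 ms per frame
TOTAL_DOSE_E_PER_A2 = 23.5    # Table 1, electron dose (e-/Å^2)

def movie_exposure_s(n_frames, frame_time_s):
    """Total exposure time of one movie, in seconds."""
    return n_frames * frame_time_s

def dose_per_frame(total_dose, n_frames):
    """Average dose per frame (e-/Å^2), assuming an even spread over frames."""
    return total_dose / n_frames

exposure = movie_exposure_s(N_FRAMES, FRAME_TIME_S)
per_frame = dose_per_frame(TOTAL_DOSE_E_PER_A2, N_FRAMES)
print(f"exposure per movie: {exposure:.1f} s")
print(f"dose per frame: {per_frame:.3f} e-/A^2")
```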

485 

486 CryoEM data processing 

487 Frame alignment was carried out using MotionCor2(64). The parameters of the

488 microscope contrast transfer function were initially estimated using GCTF(65) and then

489 using CTFFIND4(66). Particles were automatically picked using DoGPicker(67). Particle

490 images were extracted and processed using Relion 2.0(68) with a box size of 640

491 pixels² and a pixel size of 0.665 Å. Following reference-free 2D classification, we ran 3D

492 classification with C1 symmetry(69) using an initial model generated with

493 e2initialmodel.py in EMAN2. 455,710 particles were selected to run a gold-standard 3D

494 refinement imposing C3 symmetry using Relion 2.1(70) that led to a map at 3.5 Å

495 resolution. Post processing was done using Relion to apply an automatically generated

496 B factor of -150 Å². Reported resolutions are based on the gold-standard FSC=0.143

497 criterion(70, 71) and Fourier shell correlation curves were corrected for the effects of

498 soft masking by high-resolution noise substitution(72). The soft mask used for FSC

499 calculation had a 10-pixel cosine edge fall-off.
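Two quantities in this paragraph lend themselves to a short worked example: the sampling geometry (box edge and Nyquist limit) and the way a resolution is read off an FSC curve at the 0.143 threshold. This is a minimal sketch, not Relion's implementation. The FSC values are made up for illustration and the function names are hypothetical.

```python
PIXEL_SIZE_A = 0.665   # Å per pixel, from the text
BOX_PIXELS = 640       # box edge in pixels, from the text

def box_edge_angstrom(box_pixels, pixel_size):
    """Physical edge length of the particle box, in Å."""
    return box_pixels * pixel_size

def nyquist_resolution(pixel_size):
    """Finest resolution representable at this sampling (2 x pixel size)."""
    return 2.0 * pixel_size

def resolution_at_threshold(freqs, fsc, threshold=0.143):
    """Resolution (Å) where the FSC curve first drops below `threshold`.

    freqs: spatial frequencies in 1/Å, increasing; fsc: FSC value per shell.
    Linear interpolation between the two shells that bracket the crossing.
    Returns None if no such crossing is found.
    """
    for i in range(1, len(fsc)):
        if fsc[i - 1] >= threshold > fsc[i]:
            frac = (fsc[i - 1] - threshold) / (fsc[i - 1] - fsc[i])
            crossing = freqs[i - 1] + frac * (freqs[i] - freqs[i - 1])
            return 1.0 / crossing
    return None

# Made-up FSC curve for illustration only (not the published curve).
toy_freqs = [0.10, 0.20, 0.25, 0.30, 0.35]
toy_fsc = [0.99, 0.90, 0.50, 0.10, 0.02]
print(box_edge_angstrom(BOX_PIXELS, PIXEL_SIZE_A))
print(nyquist_resolution(PIXEL_SIZE_A))
print(resolution_at_threshold(toy_freqs, toy_fsc))
```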

500 

501 Model building and analysis 

502 UCSF Chimera(73) was used to fit the HCoV-NL63 S structure(22) into the cryoEM map

503 before manual rebuilding in Coot(32, 74). Glycans at NXS/T motifs were

504 initially hand-built into the density where visible and glycan geometry was then

505 refined using Rosetta, optimizing the fit-to-density as well as the energetics of

506 protein/glycan contacts. The final model was refined using the symmetric modeling

507 framework in Rosetta(33, 75). The quality of the final model was analyzed with

508 MolProbity(76) and Privateer(77). All figures were generated with UCSF Chimera(73).
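Candidate N-linked glycosylation sites of the kind referred to above (NXS/T motifs) can be listed directly from a sequence. This is a minimal sketch, assuming the conventional definition in which X is any residue except proline. The function name and the toy peptide are hypothetical, and the peptide is not a fragment of PDCoV S.

```python
import re

# N-X-S/T motif; X is conventionally any residue except proline.
# The lookahead lets overlapping motifs (e.g. "NNTS") all be reported.
SEQUON = re.compile(r"N(?=[^P][ST])")

def find_sequons(sequence, offset=1):
    """Return residue numbers of Asn residues in N-X-S/T motifs.

    `offset` is the residue number of the first character of `sequence`
    (1 for a full-length protein, 20 for a fragment starting at residue 20).
    """
    return [m.start() + offset for m in SEQUON.finditer(sequence.upper())]

# Hypothetical peptide for illustration.
toy = "MKNGSAANPTLLNVTQNNTS"
print(find_sequons(toy))
```

A motif match only marks a potential site; whether a glycan is actually present still has to come from the density or from mass spectrometry.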

509 

510 Mass Spectrometry 

511 250 pmol of PDCoV S was incubated in a freshly prepared solution containing 100 mM

512 Tris pH 8.5, 2% sodium deoxycholate, 10 mM tris(2-carboxyethyl)phosphine, and 40 mM

513 iodoacetamide at 95 °C for five minutes followed by 25 °C for thirty minutes in the dark.

514 80 pmol of denatured, reduced, and alkylated PDCoV S was then diluted into freshly

515 made 50 mM ammonium bicarbonate and incubated for 14 hours at 37 °C with

516 1:75 (w:w) of trypsin (Sigma Aldrich), chymotrypsin (Sigma Aldrich) or alpha lytic

517 protease (Sigma Aldrich). Formic acid was then added to a final concentration of 2% to

518 precipitate the sodium deoxycholate in the samples, followed by centrifugation at 14,000

519 rpm for 20 minutes. The supernatant containing the (glyco)-peptides was collected and

520 spun again at 14,000 rpm for 5 min immediately before sample analysis. For each

521 sample 8 µL was injected on a Thermo Scientific Orbitrap Fusion Tribrid mass

522 spectrometer. A 35-cm analytical column and a 3-cm trap column filled with ReproSil-

523 Pur C18AQ 5 µm (Dr. Maisch) beads were used. Nanospray LC-MS/MS was used to

524 separate peptides over a 110-min gradient from 5% to 30% acetonitrile with 0.1% formic

525 acid. A positive spray voltage of 2,100 V was used with an ion-transfer-tube temperature

526 of 350 °C. An electron-transfer/higher-energy collision dissociation ion-fragmentation

527 scheme(34) was used with calibrated charge-dependent ETD parameters and

528 supplemental higher-energy collision dissociation energy of 0.15. A resolution setting of

529 120,000 with an AGC target of 2 × 10⁵ was used for MS1, and a resolution setting of

530 30,000 with an AGC target of 1 × 10⁵ was used for MS2. Data were searched with

531 Protein Metrics Byonic software(78), using a small custom database of recombinant

532 protein sequences including several coronavirus spike proteins, other viral glycoproteins 

533 and the proteases used to prepare the glycopeptides. Reverse decoy sequences were 

534 also included in the search. Specificity of the search was set to C-terminal cleavage at 

535 R/K (trypsin), F/W/Y/M/L (chymotrypsin) or T/A/S/V (alpha lytic protease) allowing up to 

536 two missed cleavages, with EThcD fragmentation (b/y- and c/z-type ions). We used a 

537 precursor mass and product mass tolerance of 12 ppm and 24 ppm respectively. 

538 Carbamidomethylation of cysteines was set as fixed modification, methionine oxidation

539 as variable modification, and all four software-provided N-linked glycan databases were 

540 combined into a single non-redundant list used to identify glycopeptides. All 

541 glycopeptide hits were manually inspected and only those with quality peptide sequence 

542 information are reported here. 
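The search settings above (C-terminal cleavage after the listed residues, up to two missed cleavages, ppm mass tolerances) can be illustrated with a small in silico digestion. This is a sketch of the rules as stated in the text, not of Byonic's internals. It ignores refinements such as proline-blocked cleavage, and the function names and toy peptide are hypothetical.

```python
# Residues after which each protease is taken to cleave (from the text).
CLEAVAGE_RESIDUES = {
    "trypsin": "RK",
    "chymotrypsin": "FWYML",
    "alpha_lytic_protease": "TASV",
}

def digest(sequence, protease, max_missed=2):
    """Peptides from C-terminal cleavage after the protease's residues,
    allowing up to `max_missed` missed cleavages."""
    sites = CLEAVAGE_RESIDUES[protease]
    # Fully cleaved fragments first.
    fragments, start = [], 0
    for i, residue in enumerate(sequence):
        if residue in sites:
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])
    # Join up to max_missed + 1 consecutive fragments.
    peptides = []
    for i in range(len(fragments)):
        last = min(i + max_missed, len(fragments) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(fragments[i:j + 1]))
    return peptides

def within_ppm(observed, theoretical, tolerance_ppm):
    """True if the relative mass error is within the given ppm tolerance."""
    return abs(observed - theoretical) / theoretical * 1e6 <= tolerance_ppm

toy = "AKGRNVTK"  # hypothetical sequence
print(digest(toy, "trypsin"))
print(within_ppm(1000.010, 1000.000, 12))  # about 10 ppm error, precursor tolerance
```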

543 

544 Proteolysis of PDCoV S and SARS S glycoproteins 

545 Proteins at a concentration of 0.5 mg/mL (PDCoV S) or 1 mg/mL (SARS-CoV S) were

546 incubated with 0.1 mg/mL of either trypsin (Sigma Aldrich) or chymotrypsin at 22 °C for

547 two hours. The reactions were then analyzed by SDS-PAGE.
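The concentrations above correspond to different protease-to-substrate mass ratios for the two glycoproteins. This is a minimal sketch of that arithmetic, assuming the quoted values are the final concentrations in the reaction. The function name is illustrative.

```python
PROTEASE_MG_ML = 0.1  # trypsin or chymotrypsin, from the text

def protease_to_substrate_ratio(protease_mg_ml, substrate_mg_ml):
    """Return x such that the mix is 1:x (w:w) protease:substrate."""
    return substrate_mg_ml / protease_mg_ml

for name, conc_mg_ml in (("PDCoV S", 0.5), ("SARS-CoV S", 1.0)):
    ratio = protease_to_substrate_ratio(PROTEASE_MG_ML, conc_mg_ml)
    print(f"{name}: 1:{ratio:.0f} (w:w)")
```

Under that assumption, PDCoV S was exposed to a protease load per unit mass at least as high as that applied to SARS-CoV S.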

548 

549 Accession number(s) 

550 The mass-spectrometry data have been deposited to PRIDE with accession

551 code PXD007107 and include the raw data, Byonic search results and the databases

552 used for protein sequences and N-linked glycan modifications. The EM map and PDB

553 model have been deposited with accession codes EMD-7094 and 6BFU.

554 

555

Figure Captions

Fig 1. CryoEM structure of the PDCoV S protein. A, A representative micrograph of
vitreous ice-embedded PDCoV S protein at 3.4 µm defocus. Scale bar: 510 Å. B,
Selected 2D class averages of the PDCoV S protein. Scale bar: 85 Å. C-D, Side (C) and
top (D) views of the PDCoV S cryoEM map filtered at 3.5 Å resolution and sharpened
with a B-factor of -150 Å². The density is colored per protomer. E-F, Ribbon
representation of the PDCoV S trimer structure rendered with the same orientations as
in panels C-D. One protomer is colored according to the indicated structural domains
whereas the other two protomers are colored gray.

802

803 Fig 2. Glycosylation profile of the PDCoV S protein. A-B, Two orthogonal views of 

804 the PDCoV S trimer rendered as ribbons. Glycan density extracted from the 

805 unsharpened reconstruction is colored green for one protomer and grey for the other 

806 two protomers. Labels indicate the position of N-linked glycosylated asparagine 

807 residues. C, Schematic summary of all detected N-linked glycans. Each site shows the 

808 most extensive glycan structure detected, either by mass-spectrometry or cryoEM. A full 

809 overview of all detected N-linked glycans is provided in Supplementary Table 1. Glycan 

810 moieties are represented as symbols according to the key and the structural domains are

811 individually colored and indicated in a linear representation of the PDCoV S sequence. 

812 D-E, Ribbon representation of PDCoV (D) and HCoV-NL63 (E) S protomers with 

813 glycans visualized by cryoEM shown as green spheres. 

814 

815 

816 Fig 3. Structural features of the PDCoV S1 subunit and the galectin-like domain A.

817 A, Superposition of the PDCoV and HCoV-NL63 S1 subunits highlights the absence of

818 domain 0 in PDCoV S. B, View of the interface between PDCoV S A and B domains

819 showing that the Asn-184 glycan points away from domain B. C, View of the interface

820 between HCoV-NL63 S A and B domains showing that the Asn-358 glycan contributes to

821 masking the receptor-binding loops. D, Ribbon representation of PDCoV domain A. E, 

822 Ribbon representation of BCoV domain A oriented identically to panel (D). Highly 

823 conserved residues involved in sialic acid recognition are shown in ball and stick 

824 representation. Glycans are rendered as spheres in panels A-C or sticks in panels D-E

825 and colored per atom type (carbon: green, nitrogen: blue and oxygen: red). F, The

826 PDCoV S1 subunit C-terminally tagged with the Fc portion of human IgG (S1-Fc) was

827 tested for its ability to hemagglutinate an erythrocyte suspension of human or rat

828 origin, either alone or premixed with protein A-coupled nanoparticles to increase the

829 avidity of S1-Fc proteins for sialic acids. The sialic-acid binding S1 subunit of HCoV-

830 OC43 (GenBank: AAR01015.1) C-terminally fused to the human Fc portion was used as a

831 positive control. ‘Mock’ indicates the condition where no S1 subunit was used (negative

832 control). Wells positive for hemagglutination are encircled. 

833 

834 

835 Fig 4. Structural comparison of α- and δ-coronavirus receptor-binding domains.

836 A-D, Ribbon rendering of the receptor-binding domain (domain B) of the δ-genus

837 PDCoV S (A) and α-genus PRCV S (B), HCoV-NL63 S (C) and TGEV S (D). Loops that

838 have been implicated in receptor-binding for α-coronaviruses are indicated. Key

839 aromatic residues that have been shown to take part in α-coronavirus receptor-binding

840 and putatively involved in δ-coronavirus receptor-binding are highlighted. Disulphide

841 bonds that stabilise receptor binding loops are indicated and glycans within the domain 

842 are shown as sticks (carbon: green, nitrogen: blue and oxygen: red). 

843 

844 

845 Fig 5. Structural features of the PDCoV S2 subunit. A, Ribbon representation of the

846 PDCoV S trimer with the S2 subunit core of one protomer colored from blue to red (from

847 N-terminus to C-terminus). B, Zoomed-in view of the S2’ activation loop region. Two

848 glycans, linked to Asn-669 and Asn-673, that are strictly conserved in HCoV-NL63 S are 

849 shown as sticks (carbon: green, nitrogen: blue and oxygen: red). For comparison, the 

850 equivalent residues in the HCoV-NL63 S protein are indicated in gray. C, The PDCoV S

851 glycoprotein features an insertion of 14 amino acid residues in HR1, compared to the β-

852 coronavirus MHV S protein, folding as an extended loop and a helical extension of two

853 turns. The residues accounting for this HR1 insertion interact with the complementary 

854 insertion in HR2 in the post-fusion conformation (Fig S2 B). 

855 

856 

857 Fig 6. The PDCoV S glycoprotein is resistant to digestive enzymes. Purified SARS 

858 S (1 mg/ml) and PDCoV S (0.5 mg/ml) glycoproteins were incubated with 0.1 mg/ml 

859 trypsin or chymotrypsin for 2 hours at 22 °C. The digestion reactions were analyzed on 

860 a 12% SDS-PAGE gel. After incubation, the SARS S protein was extensively 

861 proteolyzed whereas a large fraction of the PDCoV S protein remained intact.

862 

863 

864 

865

Table 1 Data collection and refinement statistics

Data collection
  Number of particles              455,710
  Pixel size (Å)                   1.33
  Voltage (kV)                     300
  Electron dose (e⁻/Å²)            23.5

Refinement
  Resolution (Å)                   3.5
  Map-sharpening B factor (Å²)     -150

Model validation
  Favored rotamers (%)             98
  Poor rotamers (%)                0.35
  Ramachandran allowed (%)         99.69
  Ramachandran outliers (%)        0.31
  Clash score                      2.2
  MolProbity score                 1.27